AIbase

# Multi-frame Analysis

## Cogvlm2 Llama3 Caption

License: Other

CogVLM2-Caption is a video captioning model used to generate training data for the CogVideoX model.

Tags: Video-to-Text · Transformers · English

Publisher: THUDM · Downloads: 7,493 · Likes: 95

## Vivit B 16x2 Kinetics400

License: MIT

ViViT is an extension of the Vision Transformer (ViT) to video, particularly suited to video classification tasks.

Tags: Video Processing · Transformers

Publisher: google · Downloads: 56.94k · Likes: 32

© 2025 AIbase